Abstract- Automatic face recognition 
has ample significance in biometric 
research. Recent decades have 
witnessed enormous growth in this 
research area. Face-based 
identification is always considered 
more expedient as compared to other 
biometric authentications owing to 
its uniqueness and wide acceptance. 
The major contribution of this work 
is twofold; firstly, it comprises an 
extension of manual thresholding 
feature-based face recognition 
approach to an automatic feature- 
based supervised learning face 
recognition. Secondly, various new 
feature sets are proposed and tested 
on several classifiers for 2, 3, 4, and 5 
persons. In addition, the use of slope 
features of facial components, such 
as the nose, right eye, left eye, and 
lips along with other conventional 
features for face recognition is also a 
unique contribution of this research. 
Multiple experiments were 
performed on the UMT face 
database. The results demonstrated a 
comparison of 5 different sets of 
feature-based approaches on 7 
classifiers using the metrics of time 
efficiency and accuracy. They also
showed that the proposed
approaches achieve an
accuracy of up to 95.5%.


Index Terms- face identification, face 
recognition, geometric features, 
supervised learning, Support Vector 
Machine (SVM) 


I. Introduction 


Despite the fact that numerous 
algorithms have been proposed in 
the literature, face recognition in 
unconstrained scenes remains a 
challenging problem due to changes 
in appearance caused by the 
variations in pose and expression. 
Still, face recognition has achieved
great popularity during the last
few decades in computer vision,
pattern recognition, neural
networks, neuroscience, cognitive
science, image processing,
psychology, and physiology for
good reasons.


The major reasons for its wide
acceptance are its simplicity,
security, and authenticity. Face
authentication works without
any user input in the form of
passwords, PIN codes, plastic cards,
keys, tokens, or smart cards,
which makes it simple. As far as
security is concerned, methods that
involve user input are no match for
face recognition, as user input in any
aforementioned form always comes
with the risk of misuse. For
example, passwords and PINs
can be guessed or stolen.


Moreover, the demand for
enhanced physical security systems
has grown rapidly in recent
years due to a dramatic increase in
the crime rate. Face recognition
plays a pivotal role in this regard. It
is widely used in numerous
applications, such as automatic
attendance systems, physical access
control, surveillance control, people
tagging, gaming, image search, and
many more. This work proposes
several feature-based approaches to
face recognition that include some
new features, namely a slope table,
along with random projection and
regional properties.


There are three major 
contributions of the current study. 
Firstly, it proposes a new approach 
for face recognition which extends 
the manual thresholding system [1] 
to a supervised learning algorithm. 
Secondly, five different sets of new 


features are suggested for feature 
matching. Thirdly, a detailed 
comparison of these approaches is 
provided using 7 existing classifiers 
on a dataset of 2, 3, 4, and 5 persons. 


The proposed work uses the 
slopes of different fiducial 
landmarks of facial components 
(nose, right eye, left eye, and lips) to 
build a slope table that is merged 
with other geometric and holistic 
features. Finally, features are fed to 
the classifiers for training and 
matching. The rest of the paper is 
organized as follows. In Section 2, a 
review of the related literature is 
presented. Section 3 gives an 
overview of the data set used for 
experiments. It also incorporates the 
proposed framework. Experimental 
evaluation is presented in Section 4. 
Finally, conclusions are drawn in 
Section 5. 


II. Literature Review 


Face recognition has been an 
active area of research for the last 
few decades. Initially, a semi- 
automated system was proposed for 
face recognition and only major 
features, such as ears, eyes, mouth, 
and nose were used for this purpose 
[2]. The algorithms of face 
recognition can be divided into two 
broad categories, namely feature- 
based techniques and holistic 
approaches [3]. Feature-based
approaches initially identify and
extract the unique facial features
and then compute geometric
relationships among these
distinctive features. Ultimately,
some standard statistical pattern
recognition techniques are applied
to these computed geometric
features for matching.


Liu et al. [4] proposed the
Gabor-Fisher Classifier (GFC) for
face identification, reporting 100%
accuracy in face recognition using
only 62 features. The Gabor-Fisher
Classifier is robust to illumination
and facial expression variations. Its
feasibility was tested on the FERET
database using 600 frontal facial
images, corresponding to 200
subjects, captured under varying
illumination and with distinct facial
expressions. These features have
been categorized as among the most
successful face representations,
valued for rapid extraction and
accurate results.


Shen et al. [5] proposed a 
framework of Gabor wavelets and 
general discriminant analysis 
(GDA) which extracts features from 
the whole face image. This 
algorithm was also tested on the 
same FERET database, as well as on 
the BANCA face database, with a 
97.5% recognition rate and 5.96% 
verification error rate, respectively. 
Afterwards, in some variations of 
these approaches, Gabor features 
were replaced by a graph-matching 
strategy [6]-[8] and the histograms 
of oriented gradients [9]. 
Campadelli et al. [10] proposed an 
elastic bunch graph-based technique 
with automatic fiducial point 
localization. This technique 
computes 16 facial fiducial points 
and the head pose. Their proposed 
technique normalizes the image and 
characterizes facial fiducial points 
with its jets vector to extract the 
peculiar texture around jets. Finally, 
they measured the similarity among 
manifold jets to recognize the 
image. This approach was tested on 
a database of 2500 face foreground 
images, yielding an accuracy of 
93% for face recognition. They 
claimed to obtain the same 
performance as the existing elastic 
bunch graph approaches with 
manual graph placement. 


The discriminant elastic graph 
matching (DEGM) algorithm by 
Zafeiriou et al. [7] uses discriminant 
techniques at all phases of elastic 
graph matching for face 
verification. This conventional 
EGM was extended to a generalized 
EGM (G-EGM) in order to provide 
improved performance on globally 
warped faces [8]. It improved the 
robustness of node descriptors to 
globally misaligned faces and 
introduced warping-compensated 
edges, with a high reported accuracy
rate for face identification. Despite
its accuracy, the technique has
not gained adequate attention in
real-time applications
due to its high computational
cost. Albiol et al. [9] applied HOG
descriptors instead of Gabor 
features, which are more robust to 
illumination changes and 
insignificant displacements. As a 
result, higher accuracy was 
achieved for face verification in 
comparison to the existing Gabor- 
EBGM based approaches. 


Yang and Zhang [10] employed 
sparse representation-based 
classification (SRC) for face 
recognition. This algorithm codes 
the input testing images as a sparse 
linear combination of training 
samples. Local Gabor features of 
images were used for SRC and an 
algorithm was introduced to 
compute the associated Gabor 
occlusion dictionary which handles 
occluded face images with much 
higher recognition rates. Moreover,
the computational cost of sparse
coding was greatly reduced by
compacting the occlusion
dictionary. The major contribution
of this work was to achieve high
accuracy against the variations in
lighting, expression, pose, and
occlusion.


Nevertheless, the Gabor
transformation is considered
impractical for numerous real-time
face recognition applications owing
to its high time and space
complexity. To combat this issue,
Yang et al. proposed a fast
technique called monogenic binary
coding for local feature extraction
[11]. In this scheme, the
original signal is decomposed into
three complementary components,
namely amplitude, orientation, and
phase. Monogenic variation and
features are encoded in each local
region and pixel, respectively.


Another variation of Gabor 
transformation was proposed by 
Zhenhua, et al. which deals with 
inter-person and  intra-person 
variations in face images by using 
the robustness of ordinal measures, 
along with the distinctiveness of 
Gabor features [12]. The above 
authors also computed the 
histogram and initially generated 
feature vectors by concatenating 
statistical distributions, which were 
further refined by linear 
discriminant analysis. Finally, the
face recognition classifier was
trained using cascade learning and
greedy block selection methods.


In holistic approaches, global
representations of the entire face are
used for recognition instead of
selected local features. Examples
include Eigenfaces [13], principal
component analysis (PCA), linear
discriminant analysis (LDA) [14],
and independent component
analysis-based approaches. These
approaches take all images as 
matrices of intensity values. For 
every input image, a direct 
correlation comparison of this 
image is performed with all other 
faces in the database for face 
recognition. Holistic approaches 
can be further subdivided into two 
main categories, that is, statistical 
approaches and artificial 
intelligence (AI) based approaches. 
The simplest approach in statistical 
approaches uses intensity values of 
all facial pixels and performs 
correlation comparisons with all 
faces stored in the database. This 
approach is considered impractical 
because it is not only 
computationally expensive but 
works only in a restricted view, such 
as equal illumination, face 
orientation, and size [13]. Another 
shortcoming of such approaches is
that they usually classify data in a
space of very high dimensionality.
However, several other approaches
have been proposed to combat the
dimensionality problem by reducing
the statistical dimensionality and
retaining only meaningful feature
dimensions for comparisons. Lu et
al. also presented an unsupervised
method that learns hierarchical
feature representations directly
from the raw pixels of facial
images, instead of using
Gabor features or any other
conventional descriptor [14]. They
used multiple feature dictionaries to 
represent different physical 
characteristics of the unique face 
region and computed multiple 
related features from them. 


Sirovich et al. [15] represented 
facial images economically by 
utilizing PCA [16], [17]. They used 
Eigen-pictures coordinate space to 
represent a face efficiently and 
reconstructed that face using a small 
collection of Eigen-pictures and 
their projections. Jian et al. [18] 
introduced the Spectro face method 
with 98% accuracy on Yale and 
Olivetti face databases. This method 
combines the Fourier transform and 
wavelet transform and proves that 
low-frequency images, decomposed 
using the wavelet transform, are less 
sensitive to the variations in facial 
expression. This method uses 
transformed images, invariant to 
scale, translation, and on-the-plane 
rotation. The outcome of the study 
showed that wavelet transform 
delivered robust representations in 
illumination variations and captured 
substantial facial features with low 
computational complexity. 


Following these considerations,
an efficient scheme for face
recognition by applying wavelet
sub-band representation was
proposed by Zhang et al. [19]. Their
method used two-dimensional (2D) 
wavelet sub-band coefficients to 
represent the face images. They also 
used a personalized classification 
method for recognition based on 
kernel associative memory (KAM) 
models. This method was tested on 
three standard face identification 


datasets, namely XM2VTS, 
FERET, and ORL. The results 
showed that this method 
outperformed the existing 


approaches in terms of accuracy. 
They also utilised KAM in 
combination with the Gabor wavelet 
network to construct a unified 
structure of Gabor wavelet 
associative memory (GWAM) [20]. 
This method was tested using three 
famous databases and reported a 
very high performance as compared 
to other techniques. 


Kwak et al. [21] systematically 
formed an enhancement of the 
generic independent component 
analysis called FICA by 
supplementing it with Fisher LDA 
and presenting it along with its 
underlying architecture. This
method significantly improved
classification rates and
also dealt with variations in
illumination and facial expression.


It is generally accepted that low- 
resolution representation of face 
images reduces the performance of 
face recognition. To combat this 
issue, Huang et al. [22] proposed a
super-resolution approach that maps
coherent features nonlinearly. It
improves the recognition rate of
nearest-neighbor (NN) classifiers
on low-resolution face
images. Initially, coherent subspaces
are constructed by applying the 
canonical correlation analysis 
among low-resolution face images 
and the features of high-resolution 
based on PCA. Then, radial basis 
functions are used to build the 
nonlinear mapping between high 
and low-resolution features. Lastly, 
super-resolved coherent features are 
constructed and fed to a simple NN 
classifier for face identification. 


The computational complexity 
of these techniques was further 
reduced by Azad et al. [23]. In their 
proposed face recognition scheme, 
instead of taking all pixels of the 
whole image only the face was 
identified and PCA technique was 
applied to that particular region in 
order to extract the vector features. 
Then, multiclass support vector 
machine (SVM) classifiers were 
applied to these features for 
identification and classification. 
This technique achieved a 98.45% 
accuracy rate on the FEI database. 

Zhang et al. also presented a
novel and efficient low-resolution
face recognition algorithm known
as coupled marginal discriminant
mappings (CMDM) [24]. CMDM
projects data points from the low-
resolution and the original
high-resolution feature spaces
into a unified space.
It places samples from
the same class close together and
samples from distinct classes far
apart, with a large margin.
This method avoids the dimensional
mismatch problem and bridges the
gap between different resolutions.
CMDM achieved outstanding
performance on the AR and FERET
face databases as compared to other
up-to-date low-resolution face
recognition techniques.


In 2016, Yihang et al. [25] 
proposed an approach that deals 
with multimodal face and ear 
recognition in unconstrained scenes 
by using the fusion of both spherical 
depth map and the spherical texture 
map characteristics. They initially 
used sparse representation for 
recognition and then the final results 
were refined using Bayesian 
decision-level fusion. Recently,
Yongjie et al. [26] investigated low-
resolution face recognition with one
image per person. A cluster-based
regularized simultaneous
discriminant analysis (C-RSDA)
method was proposed based on
SDA. C-RSDA computes cluster-
based scatter matrices from
unsupervised clustering. The
between-class scatter matrices are
regularized with inter-cluster scatter
matrices and within-class matrices
with intra-cluster scatter matrices.
Because this approach exploits
more variation from the limited
training samples, the overfitting
and singularity problems are
handled effectively. The
effectiveness of this algorithm was
demonstrated by extensive testing
on low-resolution face images
captured in both controlled and
uncontrolled environments.


Feature-based approaches using
classifiers show promising results
and the research community
remains convinced that these
learning-based techniques can
provide better results in the future.
This paper extends the
existing manual threshold feature-
based approach to an automatic
supervised learning-based approach
for face recognition [1]. In this
work, different sets of new features
are suggested for feature matching.
Each approach was tested on seven
existing classifiers and the results
are reported in Section 5. These
classifiers include neural network
(Multilayer Perceptron (MLP)),
Naive Bayes, J48, SVM, AdaBoost,
Decision Tree, Decision Table, and
Nearest-Neighbor-like Algorithm
(NNge) [27]-[29]. Features
calculated in approach 5
outperformed all other approaches
in terms of accuracy with more than
90% true positive results. Sagonas
et al. [30] explored the problem of
joints and facial contaminated data.
They proposed the JIVE and Robust
JIVE (RJIVE) techniques, where the
1-norm was employed on the data.
They performed extensive 2D and
3D face age progression
experiments and found RJIVE to be
the most reliable technique among
those compared.


Wang et al. [31] offered a 
recurrent face aging (RFA) GRU 
framework with triple layers. GRU 
triple layer better determines face 
identification than bi-layer with a 
combination of RNNs. However, 
during their testing, there was a lack 
of age input in their experiments 
which may be catered to in future 
work. Singhal and Srivastava [32] 
carried out a survey in which 
different databases and approaches 
were explored to rectify the 
problems faced during the face 
identification process. They 
explored many facial databases, 
real-time images, and videos, so that 
effective machine-learning 
approaches could be applied to 
improve facial identification 
processes. Zhu et al. [33] proposed
a cascaded CNN approach instead
of using traditional methods. They
used a face profiling approach to
generate abundant samples for
training, and a cost function,
OWPDC, was also introduced to
prioritize the parameters.
Mollahosseini et al. [34] trained
neural networks under multiple
scenarios to compare their
performance on labelled and
non-labelled data. As per their
observation, noise estimation may
apply to a few expressions.


¹ https://sites.google.com/site/farooq1us/dataset


Hang et al. proposed an
approach in which faces are first
detected in a frame. Each face is
then aligned to the canonical
view and cropped to a normalized
pixel size [35].


Govind et al. observed that 
maximum performance percentage 
could be gained by using masked 
faces in training classifiers. Based 
on empirical analysis, it can be 
improved with new values and 
variables [36]. Marcus et al. also 
conducted an analysis on facial 
recognition application [37]. 

III. Methodology 

The proposed approaches were
evaluated on a self-generated UMT
face dataset¹ consisting of 5
subjects. The UMT face database
(DB) contains 150 face images, with
30 face images of each subject
covering variations in pose and
expression. This data was used with
10-fold cross-validation. Fig. 1 and
Fig. 2 show different frames that
contain various expressions and
different poses.


Fig. 1. Key frames from UMT database 


BoRMaN was used to detect
face images and compute the
coordinates of 22 facial points
which act as inputs to the proposed
algorithm [38]. Table I provides a
detailed description of the dataset.
The current authors assumed a single
camera setup to capture static
colored images of 500×375
resolution with various poses and
expressions, while illumination
remained constant.

In this study, various new
feature sets are proposed and used
with existing classifiers for face
recognition, instead of using a
manual threshold system for
matching [1]. The proposed
framework works in two major
phases, namely training and testing.
Each phase consists of a sequence of
four steps. The initial three steps of
both phases are the same. The first
step is the identification and
localization of the face region in the
2D face image. The second step is
the extraction of distinctive facial
components, such as mouth, eyes,
nose, as well as other facial fiducial
points calculated by using
BoRMaN. In the third step, the
geometrical relationship among
these facial points is estimated.
Thus, a vector of geometric features
is computed from the input facial
image, as shown in Eq. 1.


Fig. 2. UMT database with pose and expression variations 


$FV(I_i) = \mathrm{FeaturesCalculation}(I_F(I_i))$ (1)


where $I_F(I_i)$ denotes the face
pixels of the i-th training image for
Approach 1, Approach 2, and
Approach 3, and $FV(I_i)$ is the
feature vector of the i-th
training image. The proposed
methodology uses five different
approaches to calculate features and
these features are used with 7
classifiers including Neural
Network (Multilayer Perceptron
(MLP)), Naive Bayes, SVM, J48,
AdaBoost, Decision Tree, Decision
Table, and Nearest-Neighbor-like
Algorithm (NNge) [39], [40]. In the
final step of the training phase, these
extracted features are used to train
the classifiers, as shown in Eq. 2.


$\mathrm{TrainedClassifier} = \mathrm{Classifier}(FV(I_i))$ (2)


where $1 \le i \le N$, N = 5060

Table I

UMT Face Dataset Description

Properties | Description
No. of subjects | 5
No. of images/Videos | 150
Videos/Static | Static
Single/Multiple face | Single
Gray Scale/Color | Color
Face Pose/Various Pose | Various Pose
Facial/Various Expression | Various Expression
Illumination | Constant
Resolution | 500×375




On the other hand, in the final
step of the face recognition phase,
the feature vector (FV) of a given
test image $I_{test}$ is used by
the trained classifier for matching.
The trained classifier assigns
the image to one of 6 categories,
namely Person 1, Person 2, Person 3,
Person 4, Person 5, and no match.


$\mathrm{Category}_j = \mathrm{TrainedClassifier}(FV(I_{test})), \quad 1 \le j \le 6$ (3)


Section 5 presents the
discussion regarding the results of
all classifiers. All aforementioned
steps are explained in the sub-
sections below. Fig. 3 shows the
block diagram of the proposed
algorithm.


[Figure: Image Input, Facial Point Detection, Feature Extraction, Classifier]


Fig. 3. Block diagram of proposed framework 
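
The training and testing phases of Eqs. 1-3, together with the 10-fold cross-validation mentioned for the UMT data, can be sketched as follows. This is only an illustrative sketch under stated assumptions: a 1-nearest-neighbour rule stands in for the classifiers used in the paper, and `make_toy_features` is a hypothetical generator of synthetic feature vectors, not the UMT features.

```python
import math
import random


def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training vector closest to x (1-NN stand-in)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def k_fold_accuracy(X, y, k=10, seed=0):
    """Estimate accuracy with k-fold cross-validation."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]  # k disjoint folds
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in fold:
            if nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]:
                correct += 1
    return correct / len(X)


def make_toy_features(n_subjects=5, per_subject=30, dim=4, seed=1):
    """Hypothetical stand-in for FV(I_i): one Gaussian cluster per subject."""
    rng = random.Random(seed)
    X, y = [], []
    for subject in range(n_subjects):
        center = [10.0 * subject] * dim
        for _ in range(per_subject):
            X.append([c + rng.gauss(0.0, 1.0) for c in center])
            y.append(subject + 1)  # labels Person 1 .. Person 5
    return X, y


if __name__ == "__main__":
    X, y = make_toy_features()
    print("10-fold accuracy on toy data:", k_fold_accuracy(X, y))
```

The toy data mirrors only the shape of the dataset (5 subjects, 30 images each); the "no match" category would need an additional rejection rule that is not sketched here.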


A. Facial Fiducial Points Detection 


The proposed framework
employed BoRMaN to detect face
images and extract the facial
fiducial points for each frame.
BoRMaN iteratively uses support
vector regression (SVR) and local
appearance-based features to
provide an initial prediction of 22
points of facial components,
including nose, lips, chin, mouth,
eyebrows, and eyes. The
distribution of these 22 facial
fiducial points is shown in Fig. 4.
Ten points are used to mark both
eyes, three for the nose,
four for the lips, four for the
eyebrows, and one for the chin.
The framework then computes
ratios, angles, and areas of different
invariant features using these facial
fiducial points and formulates
several approaches to build various
novel sets of features used in the
testing and training of classifiers.


Fig 4. Facial fiducial points are detected and shown on the facial image at 
the right side by applying BoRMaN on the left image 


B. Features Reckoning 


Facial fiducial points computed
in the previous phase were used to
calculate different features using
several approaches, and these
features were given to the
classifiers for training and testing.
Identification labels of these
features were also passed as input to
a classifier for training and testing
purposes. Face recognition is based
on the output of the classifiers:
during the testing phase, each frame
is automatically assigned to the
matching class by the classifier,
rather than by comparing feature
differences against a manual
threshold. The five proposed
methods to extract features from 2D
frames are explained below. The
first four techniques differ in the
features collected, whereas the fifth
approach combines Approach 4
with the slope table.


1) Approach 1: Random 
projection features (RPF) 


This is a time-efficient holistic
feature extraction approach based on
features extracted from the facial
image and a sparse matrix. Features
are extracted by multiplying the
image features with a sparse
measurement matrix.
This model employs random
projection, which preserves the
structure of an image. A random
matrix $Z \in \mathbb{R}^{n \times m}$ whose rows have
unit length projects data from the
high-dimensional image
space $x \in \mathbb{R}^m$ to a low-dimensional
space $v \in \mathbb{R}^n$, $v = Zx$. Fig. 5
illustrates the feature-based face
recognition model using random
projection.

In this work, a sparse random
measurement matrix was adopted for
efficient dimensionality reduction,
with entries as shown in Eq. 4.


$$Z_{ij} = \begin{cases} -1 & \text{if } val < 0 \\ 0 & \text{if } val = 0 \\ 1 & \text{if } val > 0 \end{cases} \qquad (4)$$
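
The projection $v = Zx$ with a sparse matrix whose entries lie in {-1, 0, 1} can be illustrated with the short sketch below. It is a minimal sketch, not the authors' implementation: the `density` parameter and the way non-zero entries are drawn are assumptions made for illustration, and the rows are not rescaled to unit length.

```python
import random


def sparse_sign_matrix(n_rows, n_cols, density=0.3, seed=0):
    """Build a matrix with entries in {-1, 0, +1}.

    density is a hypothetical parameter: the probability that an entry
    is non-zero. The source does not state how entries are drawn.
    """
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            if rng.random() < density:
                row.append(rng.choice((-1, 1)))
            else:
                row.append(0)
        matrix.append(row)
    return matrix


def project(matrix, x):
    """Compute the compressed vector v = Z x."""
    return [sum(z * xi for z, xi in zip(row, x)) for row in matrix]


if __name__ == "__main__":
    Z = sparse_sign_matrix(n_rows=8, n_cols=100)
    x = [float(i % 7) for i in range(100)]  # stand-in for image features
    print(project(Z, x))
```

Because most entries are zero, each output component only needs the few input values where the row is non-zero, which is where the time efficiency of the approach comes from.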


2) Approach 2: Histogram of 
oriented gradient features 
(HOGF) 


This appearance and shape-
based approach calculates the
features based on the gradients of
the given facial image. The gradient
is a generalization of the usual
concept of derivative to a function
of two variables. For the given
image, it is represented as
$F(f_x, f_y)$, whose
components are the derivatives in the
horizontal and vertical directions. It
is, thus, a vector-valued function.
Initially, the gradient vector of the
given image is calculated as shown
in Eq. 5, where $I$ denotes the image
intensity.


$$F(f_x, f_y) = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right) \qquad (5)$$

The gradient vector was used to
compute the magnitude of the
gradients using Eq. 6.


$$M = \sqrt{f_x^2 + f_y^2} \qquad (6)$$


Fig. 5. Explaining random projection that converts image features into the 
compressed vector using sparse measurement matrix 


In the next step, a histogram of
this magnitude is calculated to
divide the entire range of values into
a series of intervals. This reduces
the feature vector size based on
frequency. The histogram of
magnitude is also considered in
stating distinctive values for each
image. The overall process of
feature calculation using a
histogram of the oriented gradient is
illustrated in Fig. 6.
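
As a rough illustration of the gradient and histogram steps described in this approach, the sketch below computes central-difference gradients, their magnitudes, and a fixed-size magnitude histogram. It is a sketch under assumptions: the 9 bins are borrowed from the "9-Bin histogram" label in Fig. 6, and the central-difference scheme and equal-width binning are illustrative choices that the text does not specify.

```python
import math


def gradient_magnitudes(image):
    """Central-difference gradient magnitudes for interior pixels.

    image is a 2D list of intensities; border pixels are skipped.
    """
    rows, cols = len(image), len(image[0])
    mags = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            fx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal derivative
            fy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical derivative
            mags.append(math.hypot(fx, fy))  # sqrt(fx^2 + fy^2)
    return mags


def magnitude_histogram(mags, bins=9, max_value=None):
    """Count magnitudes in equal-width bins over [0, max_value]."""
    if max_value is None:
        max_value = max(mags) if mags else 0.0
    hist = [0] * bins
    for m in mags:
        if max_value > 0:
            idx = min(int(m / max_value * bins), bins - 1)
        else:
            idx = 0
        hist[idx] += 1
    return hist


if __name__ == "__main__":
    ramp = [[float(c) for c in range(5)] for _ in range(5)]  # horizontal ramp
    print(magnitude_histogram(gradient_magnitudes(ramp)))
```

The histogram has a fixed length regardless of image size, which is how the binning step reduces the feature vector.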


[Figure: Cell Histogram Generation, 9-Bin Histogram, Histogram Buffer SRAM]


Fig. 6. Feature extraction by computing histogram of oriented gradients
using a facial image as input 


3) Approach 3: regional 
properties features (RePF) 


In this approach, the regional
properties template is applied over
image pixel values. A regional
property template is a set of
different scalar regional properties,
such as area, Euler number,
eccentricity, extent, major axis
length, orientation, solidity, Equiv-
Diameter, max intensity, minor axis
length, mean intensity, and min
intensity. Area is the
actual number of pixels in the region,
and the Euler number is the
total number of objects in the image
minus the total number of
holes. The proposed regional
property template uses 8-
connectivity to compute the Euler
number. Equiv-Diameter is the
diameter of a circle with the same
area as the region and is calculated
using Eq. 7.


$$EquivDiameter = \sqrt{\frac{4 \cdot Area}{\pi}} \qquad (7)$$


The eccentricity property
measures the eccentricity of the
ellipse that has the same second
moments as the region. Eccentricity
is the ratio of the distance between
the foci of the ellipse to its major
axis length. Its value is
between 0 and 1. In the regional
properties template, the ratio of
pixels in the region to pixels in the
total bounding box is computed
using Eq. 8.


$$Extent = \frac{Area}{Area(boundingbox)} \qquad (8)$$

The major and minor axis lengths
specify the lengths of the major and
minor axes of the ellipse that has
the same normalized second central
moments as the region, respectively.
Orientation states the angle between
the x-axis and the major axis of the
ellipse that has the same second
moments as the region. The solidity
of the region is the
proportion of pixels in the convex
hull that are also in the region, as
shown in Eq. 9.


$$Solidity = \frac{Area}{ConvexArea} \qquad (9)$$


[Figure: Facial Image, Connected Components, Regional Features, Classifiers]


Fig. 7. Feature extraction through regional properties 
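
A few of the scalar regional properties above (area, extent, and Equiv-Diameter) can be computed directly from a binary mask, as in the following sketch. It makes simplifying assumptions: the mask is a plain 2D list of 0/1 values, all foreground pixels are treated as one region, and properties that need connected-component labelling or ellipse fitting (Euler number, eccentricity, orientation, solidity) are left out.

```python
import math


def region_properties(mask):
    """Compute area, extent, and equivalent diameter of a binary mask.

    mask is a 2D list of 0/1 values; all foreground pixels are treated
    as a single region (no connected-component labelling).
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    area = len(coords)
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return {
        "area": area,
        "extent": area / bbox_area,                         # Eq. 8
        "equiv_diameter": math.sqrt(4.0 * area / math.pi),  # Eq. 7
    }


if __name__ == "__main__":
    mask = [
        [1, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    print(region_properties(mask))
```

For the small L-shaped mask in the usage example, 3 of the 4 bounding-box pixels are foreground, so the extent is 0.75.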


4) Approach 4: Ratio and angles 
of facial image (RAF) 


In this approach, a geometric
feature set is constructed from a
combination of the ratios of length
to width, the angles of triangles
formed by the facial points given by
BoRMaN, and the ratios
of the areas of these triangles.
After the detection of
these facial points, the ratio of
length to width of lips and eyes, the
ratio of the area of triangles, and the
angle of different triangles are
computed for face recognition. The
details of these features are
presented in this section.


The ratios of length to width of
these components (left and right eyes
and upper and lower lips) are
computed instead of using the raw
length or width of lips, left eye, and
right eye to ensure scale invariance.
The length or the width of the left
eye and right eye alone could be
taken as a feature, but such a
feature is not scale invariant. This
is why the ratio of the length and
width of lips and eyes (both left and
right) is used as a scale-invariant
feature, as shown in Eq. 10.


$$Ratio = \frac{Lips_{length}}{Lips_{width}} \qquad (10)$$


where lip length is calculated using
the Euclidean distance formula
between points 5 $(x_5, y_5)$ and
6 $(x_6, y_6)$ marked in Fig. 8(a), as
shown in Eq. 11.


$$Lips_{length} = \sqrt{(x_6 - x_5)^2 + (y_6 - y_5)^2} \qquad (11)$$

Similarly, lip width is
calculated using the Euclidean
distance formula between points 7
$(x_7, y_7)$ and 8 $(x_8, y_8)$ marked in
Fig. 8(a), as shown in Eq. 12.


$$Lips_{width} = \sqrt{(x_8 - x_7)^2 + (y_8 - y_7)^2} \qquad (12)$$
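
The length-to-width ratio of Eqs. 10-12 reduces to two Euclidean distances and a division. The sketch below shows this and checks scale invariance on made-up coordinates; the function name and the example points are hypothetical and are not taken from the UMT data or from BoRMaN output.

```python
import math


def length_to_width_ratio(len_start, len_end, wid_start, wid_end):
    """Ratio of two Euclidean distances between (x, y) landmark pairs."""
    length = math.dist(len_start, len_end)
    width = math.dist(wid_start, wid_end)
    if width == 0:
        raise ValueError("width landmarks coincide; ratio is undefined")
    return length / width


if __name__ == "__main__":
    # Hypothetical lip landmarks: points 5 and 6 (length), 7 and 8 (width).
    p5, p6, p7, p8 = (0, 0), (4, 0), (2, -1), (2, 1)
    print(length_to_width_ratio(p5, p6, p7, p8))

    # Scaling every coordinate by the same factor leaves the ratio unchanged.
    scaled = [(3 * x, 3 * y) for x, y in (p5, p6, p7, p8)]
    print(length_to_width_ratio(*scaled))
```

Both calls give the same value, which is the property that motivates using the ratio rather than the raw length or width.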
The ratio of length to width of the left eye in Fig. 8(a) is computed
from the eye length and width, which are calculated by the Euclidean
distance formula using points 1 $(x_1, y_1)$ and 2 $(x_2, y_2)$, and
points 3 $(x_3, y_3)$ and 4 $(x_4, y_4)$, respectively. The ratio of
length to width of the right eye is computed in a similar manner.


Fig. 8. Ratios of angles of triangles


The proposed method uses the ratios of the areas of triangles because a
ratio is scale-invariant. The three triangles marked in Fig. 8(b),
$\triangle(2, 6, 17)$, $\triangle(2, 6, 21)$, and
$\triangle(18, 20, 17)$, are used to calculate the ratios of their
areas using the formula given in Eq. 13.

$$\mathrm{Ratio}_{AreaOfTriangle} = \frac{\mathrm{Area}_{Triangle1}}{\mathrm{Area}_{Triangle2}} \tag{13}$$

The areas of the triangles are computed using Heron's formula, as shown
in Eq. 14.


$$\mathrm{Area}_{Triangle} = \sqrt{s(s - a)(s - b)(s - c)} \tag{14}$$

where $a$, $b$, and $c$ are the lengths of the sides of the triangle,
computed by Euclidean distance, and $s$ is the semi-perimeter of the
triangle (half the sum of the three side lengths), calculated in Eq. 15.

$$s = \frac{a + b + c}{2} \tag{15}$$
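
The area-ratio feature of Eqs. 13-15 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the example triangles are hypothetical, and the clamp to zero before the square root is a numerical safeguard added here, not something described in the paper.

```python
import math


def side_lengths(p, q, r):
    """Side lengths (a, b, c) of the triangle with vertices p, q, r."""
    a = math.hypot(r[0] - q[0], r[1] - q[1])
    b = math.hypot(r[0] - p[0], r[1] - p[1])
    c = math.hypot(q[0] - p[0], q[1] - p[1])
    return a, b, c


def heron_area(p, q, r):
    """Triangle area from its side lengths (Eq. 14) and semi-perimeter (Eq. 15)."""
    a, b, c = side_lengths(p, q, r)
    s = (a + b + c) / 2.0
    # Rounding can make the product slightly negative for nearly collinear
    # points, so it is clamped to zero before taking the square root.
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))


def area_ratio(triangle1, triangle2):
    """Ratio of the areas of two triangles (Eq. 13)."""
    area2 = heron_area(*triangle2)
    if area2 == 0:
        raise ValueError("second triangle is degenerate")
    return heron_area(*triangle1) / area2


# Hypothetical triangles; the vertices are not real BoRMaN points.
tri1 = ((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))  # area 6
tri2 = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))  # area 2
feature = area_ratio(tri1, tri2)
```

For the two made-up right triangles the areas are 6 and 2, so `feature` is 3 (up to floating-point rounding).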
The values of the angles of different triangles marked in Fig. 8(b)
are used as features for face recognition. These angles are calculated
using the fundamental cosine formula described below in Eqs. 16-18.


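$$\alpha = \cos^{-1}\left(\frac{b^2 + c^2 - a^2}{2bc}\right) \tag{16}$$
$$\beta = \cos^{-1}\left(\frac{a^2 + c^2 - b^2}{2ac}\right) \tag{17}$$
$$\gamma = \cos^{-1}\left(\frac{a^2 + b^2 - c^2}{2ab}\right) \tag{18}$$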


The angles of eight triangles ($\angle(2, 6, 17)$, $\angle(2, 6, 21)$,
$\angle(2, 6, 22)$, $\angle(17, 20, 22)$, $\angle(6, 2, 17)$,
$\angle(6, 2, 21)$, $\angle(6, 2, 22)$, and $\angle(17, 18, 22)$) among
the points marked in Fig. 8(a, b) are computed using the formulas in
Eqs. 16-18.
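
A minimal Python sketch of Eqs. 16-18 is given below. It assumes that the angles are reported in degrees and clamps the cosine to [-1, 1] to absorb rounding error; both choices, as well as the function name and the example vertices, are assumptions of this sketch and not details stated in the paper.

```python
import math


def triangle_angles(p, q, r):
    """Interior angles (alpha, beta, gamma) in degrees of the triangle p, q, r.

    a, b, c are the sides opposite to the vertices p, q, r, so alpha is the
    angle at p, beta the angle at q, and gamma the angle at r (Eqs. 16-18).
    """
    a = math.hypot(r[0] - q[0], r[1] - q[1])
    b = math.hypot(r[0] - p[0], r[1] - p[1])
    c = math.hypot(q[0] - p[0], q[1] - p[1])
    if min(a, b, c) == 0:
        raise ValueError("two vertices coincide")

    def angle(opposite, side1, side2):
        cos_value = (side1 ** 2 + side2 ** 2 - opposite ** 2) / (2 * side1 * side2)
        # Clamp to the valid domain of acos to absorb rounding error.
        cos_value = max(-1.0, min(1.0, cos_value))
        return math.degrees(math.acos(cos_value))

    return angle(a, b, c), angle(b, a, c), angle(c, a, b)


# Hypothetical right triangle; the vertices are not real facial points.
alpha, beta, gamma = triangle_angles((0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
```

For this 3-4-5 triangle `alpha` is the right angle at the first vertex, and the three angles sum to 180 degrees, which is a convenient sanity check when the same routine is applied to landmark triangles.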


5) Approach 5: Ratios, angles, and 
slopes of facial image (RASF) 


In this approach, the same geometric features are used as in RAF,
discussed in Section 3.2.4. However, this approach also uses the slope
features of a facial image along with the ratios of length to width,
the ratios of areas, and the angles. Ratios and angles are computed
using Eqs. 13-18. For slope features, the proposed framework introduces
a new technique in which a slope table is used as a feature along with
the other features for face recognition. A slope table is constructed
by calculating the slopes of the different fiducial landmarks of facial
components (left eye, right eye, nose, and lips).


Initially, the centroid of each facial component (obtained from
BoRMaN) is computed. Then, the Euclidean distance is computed between
this centroid and the rest of the points. Finally, slope tables are
computed for the left and right eyes by calculating the slope of each
point of the left eye and right eye, respectively. Similarly, slope
tables of the lips and nose are computed by employing their respective
fiducial points. Slope computation is shown in Fig. 9. The centroid of
a facial component is calculated using Eq. 19.

$$c = \left(\frac{\sum_{i=1}^{n} x_i}{n}, \frac{\sum_{i=1}^{n} y_i}{n}\right) \tag{19}$$

In the above equation, $(x_i, y_i)$ are the two coordinates of the
$i$-th point of a facial component and $n$ is the number of points of
that component. $P_x$ and $P_y$ are the distances of the boundary point
to the centroid along the x-axis and y-axis, respectively. $P$
represents the line from the boundary point to the centroid, whereas
$P_x$ is the x-component and $P_y$ is the y-component of the line $P$.
$P_x$ and $P_y$ are used to compute the slope as shown in Eq. 20.

$$m = \tan\theta = \frac{P_y}{P_x} \tag{20}$$

For example, for a point $P1$:

$dP1_x = P1_x - c(x)$ is the difference between the x-coordinate of
point $P1$ and the centroid's x-coordinate, i.e., $dP1_x = 3 - 3.25 = -0.25$.

$dP1_y = P1_y - c(y)$ is the difference between the y-coordinate of
point $P1$ and the centroid's y-coordinate, i.e., $dP1_y = 4 - 7 = -3$.




Fig. 9. Slope table computation 
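
The slope table computation (Eqs. 19 and 20) can be sketched in Python as below. The point P1 = (3, 4) and the centroid (3.25, 7) follow the worked numbers in the text; the remaining three points are hypothetical and were chosen only so that the centroid comes out as (3.25, 7). Returning `None` for a vertical line, where the slope is undefined, is an assumption of this sketch, since the paper does not state how that case is handled.

```python
def centroid(points):
    """Centroid of a component's landmarks (Eq. 19): mean x and mean y."""
    n = len(points)
    if n == 0:
        raise ValueError("a component needs at least one point")
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def slope_table(points):
    """Slope of the line from the centroid to each landmark (Eq. 20).

    Returns one entry per point. A vertical line (P_x == 0) has no finite
    slope; it is reported as None, which is an assumption of this sketch.
    """
    cx, cy = centroid(points)
    table = []
    for x, y in points:
        p_x = x - cx  # x-component of the line P
        p_y = y - cy  # y-component of the line P
        table.append(None if p_x == 0 else p_y / p_x)
    return table


# P1 = (3, 4) matches the worked numbers; the other points are hypothetical.
component = [(3, 4), (2, 8), (5, 9), (3, 7)]
c = centroid(component)
slopes = slope_table(component)
```

Here `c` is (3.25, 7.0), and the first entry of `slopes` is (-3) / (-0.25) = 12, in agreement with the differences $dP1_x = -0.25$ and $dP1_y = -3$ given above.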


Several classifiers are trained and 
tested using these facial features. 


C. Training and Testing 


All the features calculated in Section 4.2 were employed as input to
seven classifiers to train them. Then, the features of the test image
were fed to the trained classifiers for face matching. These
classifiers included SVM, AdaBoost, Nearest-Neighbor-like algorithm
(NNge), Naive Bayes, Decision Table, J48, and Neural Network
(Multilayer Perceptron (MLP)) [38]. In the matching phase, the
classifiers automatically categorized the input image into 6 categories
in the extreme case (5-person testing): person 1, 2, 3, 4, or 5, or not
matched. Similarly, for the 4-person problem, the classifiers
categorized the image into 5 categories: person 1, 2, 3, or 4, or not
matched, and so on.
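
The training and matching steps can be illustrated with the following Python sketch. It is not one of the seven classifiers used in this work: a plain nearest-neighbor rule stands in for them, "not matched" is treated as one more class label in the training data (an assumption, since the text does not state how that category is obtained), and all feature vectors are made up.

```python
import math


def train_nearest_neighbor(features, labels):
    """'Training' for a 1-nearest-neighbor stand-in: store the labeled vectors."""
    if len(features) != len(labels) or not features:
        raise ValueError("features and labels must be non-empty and aligned")
    return list(zip(features, labels))


def classify(model, feature_vector):
    """Return the label of the stored vector closest to feature_vector."""
    best_label, best_distance = None, math.inf
    for vector, label in model:
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, feature_vector)))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Hypothetical two-dimensional feature vectors (e.g., two ratios per image).
train_features = [(2.5, 1.8), (2.6, 1.9), (3.4, 1.2), (3.5, 1.1), (1.0, 3.0)]
train_labels = ["person 1", "person 1", "person 2", "person 2", "not matched"]

model = train_nearest_neighbor(train_features, train_labels)
predictions = [classify(model, v) for v in [(2.55, 1.85), (3.45, 1.15), (1.1, 2.9)]]
```

In this toy 2-person setting the classifier chooses among three categories (person 1, person 2, or not matched), mirroring the 6 categories of the 5-person case.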


IV. Results and Discussion 


The proposed approach was tested on seven classifiers for accuracy and
scalability on a wide variety of images using five feature-based
approaches. The results, in terms of percentage accuracy (with a 90:10
training:testing percentage split) and time efficiency, are shown in
Table II and Table III, respectively.

Table II and Table III show the results for features computed using
all the approaches elaborated above, applied to different numbers of
persons. To test these approaches, the data set was divided into
sub-parts. Hence, all the approaches were tested on 2 persons, then on
3, 4, and 5 persons, which helped to understand the dimensionality and
generalization of the current approaches.
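
The 90:10 percentage split and the percentage-accuracy metric mentioned above can be sketched in Python as follows. The shuffling step, the fixed seed, and the function names are assumptions of this sketch; the text does not describe how the samples were ordered before the split.

```python
import random


def percentage_split(samples, train_fraction=0.9, seed=0):
    """Split samples into training and testing parts (90:10 by default).

    The samples are shuffled first with a fixed seed; both the shuffling
    and the seed are assumptions made for reproducibility of the sketch.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def percentage_accuracy(predicted, actual):
    """Share of test images whose predicted label equals the true label, in %."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and aligned")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical data set of 50 image indices.
samples = list(range(50))
train, test = percentage_split(samples)

# Hypothetical predictions for four test images: three of four are correct.
accuracy = percentage_accuracy(["p1", "p2", "p1", "p3"], ["p1", "p2", "p2", "p3"])
```

With 50 samples the split yields 45 training and 5 testing items, and the toy prediction list gives an accuracy of 75%.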


Table II shows that, in terms of percentage accuracy, Approach 5
(RASF) using NNge is more effective than the other four techniques and
six classifiers for the 2-person problem. However, for the 3- and
5-person problems, RASF with the SVM classifier performs better. For
the 4-person problem, the best approach remains RASF, but with the MLP
classifier. Moreover, the average percentage accuracy of Approach 4
(RAF) over all classifiers for the 2- and 5-person problems is better
than that of the rest of the approaches, while the average percentage
accuracy of RASF for the 3- and 4-person problems is the best among all
approaches.
Hence, it can be summarized that RAF and RASF show a clear generalized
behavior that outperforms all the other approaches, irrespective of the
number of persons. The possible reasons for the better performance of
RAF and RASF are that both of these approaches use geometrical features
and the motion vector-based interpolation technique. The performance of
RASF is the best because of the addition of the slope feature to the
features of the RAF approach. So, it can be concluded that the features
of the slope table work as very useful face recognition features,
achieving the highest accuracy in all the sub-parts of the data set.


The average percentage accuracy of the classifiers over all approaches
shows that the Support Vector Machine (SVM) classifier outperforms the
rest of the classifiers, including Neural Network (Multilayer
Perceptron (MLP)), Decision Table, AdaBoost, Naive Bayes, J48, and
NNge. This is because SVM has a regularization parameter, which helps
in avoiding overfitting, and uses the kernel trick, which allows expert
knowledge about the problem to be built in by engineering the kernel.
The second- and third-best average percentage accuracies are shown by
the neural network (Multilayer Perceptron (MLP)) and Naive Bayes
classifiers, respectively (see Table II). The improved percentage
accuracy of these classifiers is likely due to their ability to handle
missing values, which is known as the completeness property. This
property ensures the scrutiny of all possible combinations of condition
values. Similarly, RAF showed the second-best percentage accuracy for
the 3- and 4-person problems and RASF reported the second-best results
for the 2- and 5-person problems, on average, over all classifiers.


Fig. 10(a-e) shows the pictorial representations of the results of the
applied approaches, namely Random Projection (RPF), Histogram of
Oriented Gradients (HOGF), Regional Properties (RePF), Ratio and Angles
(RAF), and Ratio, Angles, and Slope (RASF), in terms of percentage
accuracy for the 2-person, 3-person, 4-person, and 5-person problems.


Table II
Results of Random Projection (RPF), Histogram of Oriented Gradients
(HOGF), Regional Properties (RePF), Ratio and Angles (RAF), and Ratio,
Angles, and Slope (RASF) in terms of Percentage Accuracy along with 10-fold
Cross Validation


Approach  Person  SVM   AdaBoost  NNge  N-B    D-Table  J48   MLP   Average
RPF       1       71.6  75        71.6  76     78       78    71.6  74.5
          2       68.8  67.5      60.7  63.7   75       76.7  68    68.6
          3       56.3  61.5      58.6  59.6   68       70.2  64    62.6
          4       52.8  58.7      51.6  52.8   64       68.6  56    57.7
HOGF      1       80.3  81.6      81.6  79.7   81.7     85    81.6  81.6
          2       78.5  76.6      76.5  70.8   72.2     72.2  72.2  74.1
          3       75.5  64.7      62.5  68.5   68       66.7  67.7  67.6
          4       63.3  58        58    62.7   64       58    59.3  60.4
RePF      1       86.6  81.7      80    81.67  83.3     80    81.6  82.1
          2       79    76.6      73.5  74.5   68.9     78    73.3  74.8
          3       63.8  68.5      68.6  67.5   55.8     68    68.7  65.8
          4       58.6  60        58.3  62.7   53.3     61    56.6  58.6
RAF       1       90.6  91.6      91.6  95     91.6     81.6  93.3  90.7
          2       91.1  86.6      93.3  94.4   78.8     86.6  94.4  89.3
          3       85    75.8      85    84.1   67.5     75    88.3  80.1
          4       90.6  63.6      82.6  82     62       68    91.3  77.1
RASF      1       90.3  91.6      93.3  91.6   91.6     81.6  88.3  89.7
          2       95.5  86.6      91.1  87.7   82.8     86.6  94.4  89.3
          3       90    75.7      88.6  78.3   70.3     79.6  90.8  81.9
          4       92    68.8      79.3  75.3   63.6     68    90.6  76.8
Average           78    73.5      75.3  75.4   72       74.4  77.6


Naive Bayes=N-B, Decision table=D-Table

Table III shows the time taken to build the model for the features
computed and tested using the above-mentioned approaches. It is clear
from the results that Naive Bayes is the most time-efficient classifier
and AdaBoost is the second most time-efficient classifier. On the other
hand, the neural network (MLP) consumes the most time of all the
classifiers, whereas SVM achieves the highest accuracy without
compromising time efficiency.


Table III
Time taken to build a model for Random Projection (RPF), Histogram of
Oriented Gradients (HOGF), Regional Properties (RePF), Ratio and
Angles (RAF), and Ratio, Angles, and Slope (RASF) with Different
Classifiers


Approach  Person  SVM  AdaBoost  NNge  N-B  D-Table  J48   MLP
RPF       1       10   30        10    0    40       10    3680
          2       30   10        30    0    40       40    5330
          3       40   10        30    0    60       50    7390
          4       80   10        60    10   80       80    10050
HOGF      1       10   10        10    0    10       0     90
          2       20   0         0     0    10       0     150
          3       40   0         10    0    20       10    620
          4       50   0         0     0    20       0     350
RePF      1       10   10        10    10   40       10    2150
          2       20   30        10    0    60       10    3080
          3       30   10        30    0    60       20    4090
          4       60   10        40    0    110      20    5830
RAF       1       10   30        10    0    60       10    4470
          2       20   50        20    0    120      10    6440
          3       40   10        40    0    160      30    8760
          4       70   10        60    0    270      50    9940
RASF      1       60   130       220   40   330      80    2190
          2       30   40        20    0    80       20    3030
          3       40   10        30    0    100      20    4110
          4       70   10        30    0    140      80    5540
Average           37   21        33.5  3    90.5     27.5  4364.5


Naive Bayes=N-B, Decision table=D-Table


Fig. 11(a-e) shows the pictorial representations of the results of the
approaches used, namely Random Projection (RPF), Histogram of Oriented
Gradients (HOGF), Regional Properties (RePF), Ratio and Angles (RAF),
and Ratio, Angles, and Slope (RASF), in terms of execution time for the
2-person, 3-person, 4-person, and 5-person problems on the seven
aforementioned classifiers.


Fig. 10. Graph showing the results of Random Projection (RPF), Histogram
of Oriented Gradients (HOGF), Regional Properties (RePF), Ratio and
Angles (RAF), Ratio, Angles, and Slope (RASF) in terms of percentage
accuracy with 10-fold cross-validation


Fig. 11. Graph showing the pictorial representations of the results of
Random Projection (RPF), Histogram of Oriented Gradients (HOGF),
Regional Properties (RePF), Ratio and Angles (RAF), and Ratio, Angles, and
Slope (RASF) in terms of execution time for 2-persons, 3-persons, 4-
persons, and 5-persons problems on the seven aforementioned classifiers


A. Conclusion and Future Work


The proposed framework experiments with various features for face
identification problems and suggests the use of BoRMaN and slopes. The
results clearly show that feature-based face identification using
BoRMaN and the slopes is the best approach among all the discussed
approaches. BoRMaN returns 22 constant facial feature points, which are
used to calculate features such as ratios and angles. This framework is
scale-invariant; however, it does not perform well in dim light. In the
future, a system should be developed that also works in low light.
Besides other extensions, the availability of a large dataset
repository is an important issue. Hence, more repositories should also
be developed.